xen.git
Keir Fraser [Sat, 5 Dec 2009 12:30:46 +0000 (12:30 +0000)]
libxenlight: physmap slack for pv domains

Allow for some memory space slack in PV domains,
since they balloon (or flip pages for network rx)
and need some extra room in their pfn space.

Note that this does not allocate any extra memory
to the domain; it simply extends the physmap with
some extra room for "bounce buffering back" pfns
that are yielded to dom0.

The default slack is set at 8MB.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Acked-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Sat, 5 Dec 2009 12:29:48 +0000 (12:29 +0000)]
Update QEMU_TAG to 91ae19a7cc445030614bd0ae91548162cf0befbe

Keir Fraser [Fri, 4 Dec 2009 07:11:44 +0000 (07:11 +0000)]
libxenlight: get state for one domain

Simple function to get the dominfo state of a single domain.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:11:06 +0000 (07:11 +0000)]
libxenlight: domain resume

Add a libxenlight implementation of domain resume.
This brings a cooperative pv domain back from the
shutdown state after save, enabling checkpointing.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:10:22 +0000 (07:10 +0000)]
libxenlight: Destroy device model only for domains that have it

Destroy device model only for domains that have it.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:09:44 +0000 (07:09 +0000)]
libxenlight: avoid writing empty values to xenstore

Prevent segmentation fault caused by empty values
in key-value pairs for the /vm/ subdirectory
when restoring a pv domain.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:06:47 +0000 (07:06 +0000)]
libxenlight: disk and nic destroy calls

Expose disk and nic device destroy calls.

Also removes the obsolete device shutdown calls.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:03:45 +0000 (07:03 +0000)]
libxenlight: refactor libxl destroy code

Refactor libxl device destroy code. Abstract out the
function that waits for the watch on the state node to
fire, and create a generic device delete function.

Only a single LIBXL_DESTROY_TIMEOUT elapses when
waiting for destruction of all the devices of a
domain.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:02:49 +0000 (07:02 +0000)]
libxenlight: fix GC when cloning contexts

Provide a function to clone a context. This is necessary
because simply copying the structs will eventually
corrupt the GC: maxsize is updated in the cloned context
but not in the original one, yet both share the same array
of referenced pointers, alloc_ptrs.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:00:25 +0000 (07:00 +0000)]
xend: Fix parameters to PyArg_ParseTupleAndKeywords()

The kwd_list parameter of PyArg_ParseTupleAndKeywords() must be a
NULL-terminated list.

Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
Keir Fraser [Fri, 4 Dec 2009 06:59:33 +0000 (06:59 +0000)]
x86: XENMEM_add_to_physmap should propagate errors from guest_physmap_add_page().

Authored-by: David Lively
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Keir Fraser [Fri, 4 Dec 2009 06:58:08 +0000 (06:58 +0000)]
Add keyhandler 'g' to print all active grant table entries.

Authored-By: Robert Phillips
Signed-off-By: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Keir Fraser [Fri, 4 Dec 2009 06:51:53 +0000 (06:51 +0000)]
libxenlight: Get rid of the dependency on the LIBCONFIG_SOURCE directory.

Signed-off-by: Jean Guyader <jean.guyader@eu.citrix.com>
Keir Fraser [Fri, 4 Dec 2009 06:50:46 +0000 (06:50 +0000)]
libxenlight: Delete dep files on 'make clean', and include them in Makefile rules.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 3 Dec 2009 13:52:02 +0000 (13:52 +0000)]
grant-tables: do not fail attempts to GNTTABOP_set_version to the current version.
...even if there are active grants.

This triggers when checkpointing a guest, which essentially resumes
without actually having gone through the suspend, so the domain is
already latched to v2 inside Xen.

Also return the current actual version on success and failure. This is
not terribly useful with only 2 options, but is more robust to future
developments.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Keir Fraser [Thu, 3 Dec 2009 13:51:20 +0000 (13:51 +0000)]
xend: Add GPL license stanza to MemoryPool.py

Signed-off-by: James Song (Wei) <jsong@novell.com>
Keir Fraser [Thu, 3 Dec 2009 13:50:43 +0000 (13:50 +0000)]
Remus: fall back to xenstore if necessary

This is primarily for pvops until it gets a dedicated suspend
event channel.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Thu, 3 Dec 2009 13:50:14 +0000 (13:50 +0000)]
Remus: fix shadow memory allocation, broken by 20558:4ed3b9b1de3f

This approach is perhaps a little cleaner than directly calling
balloon.free.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Wed, 2 Dec 2009 18:46:14 +0000 (18:46 +0000)]
x86 hvm: fix up the unified HAP nested-pagefault handler.
A guest PFN may have been marked dirty and switched to p2m_ram_rw by
another CPU between the VMEXIT and lookup in this handler, so
we can't just check for p2m_ram_logdirty.  Also, handle_mmio
doesn't handle passthrough MMIO.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Wed, 2 Dec 2009 18:43:28 +0000 (18:43 +0000)]
xentop: Allow full domain name display

Add a '-f' option to xentop to allow the full domain name to be
displayed. This is the original behavior, which can cause the display
to become misaligned. Customers have requested this because only the
trailing characters of their domain names are unique, so the names
cannot be distinguished when the display is limited to a 10-character
width.

Signed-off-by: Charles Arnold <carnold@novell.com>
Keir Fraser [Wed, 2 Dec 2009 18:42:36 +0000 (18:42 +0000)]
libxenlight: fix multiple xenstore watches problem

This patch fixes the multiple xenstore watches problem in libxenlight
by opening a new xenstore connection to set and read temporary watches
on the device state nodes. This way they don't interfere with other
long-running watches.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 2 Dec 2009 18:42:03 +0000 (18:42 +0000)]
libxenlight: use watch and select in libxl_wait_for_device_model

This patch reimplements libxl_wait_for_device_model using a xenstore
watch and a select loop.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 2 Dec 2009 18:41:31 +0000 (18:41 +0000)]
libxenlight: fix dm_xenstore_record_pid

The function dm_xenstore_record_pid is executed by a child of the main
process and therefore shouldn't use the same xenstore connection:
currently it opens a new connection but still uses the old one.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 2 Dec 2009 13:45:35 +0000 (13:45 +0000)]
xenstat: Fixes for 20528:e6e3bf767d16 (stats for dom0 network bonding)

In the above c/s I introduced dom0 statistics for the case where network
bonding is used. The indentation did not match the xenstat C codebase,
and some modifications were also made to the logic, mainly not using the
parsed variables we don't care about (we only care about
{tx|rx}{bytes,packets,errs,drops} and no other variable from
/proc/net/dev) by passing NULLs for them. Also, the dom0 statistics
alteration was fixed to include {tx|rx}{drop,errs} for dom0 (the
previous version of my patch did not have this code applied).

Signed-off-by: Michal Novotny <minovotn@redhat.com>
Keir Fraser [Wed, 2 Dec 2009 13:43:37 +0000 (13:43 +0000)]
xend, vt-d: do not reserve vtd_mem if iommu is not enabled

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Wed, 2 Dec 2009 13:39:07 +0000 (13:39 +0000)]
vmx: During task-switch, read instr-len VMCS field only when valid.

Otherwise we can crash on the BUG_ON() in __get_instruction_length().

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 2 Dec 2009 08:52:50 +0000 (08:52 +0000)]
VT-d: Fix indentation to make log messages more readable in dmar.c

Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Wed, 2 Dec 2009 08:51:59 +0000 (08:51 +0000)]
pci: Correct BDF format from B:D:F to B:D.F in log messages.

Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Wed, 2 Dec 2009 08:51:12 +0000 (08:51 +0000)]
xend: Memory pool for pv guest on systems with >128G memory

The main idea of this patch is:

1) The admin sets aside some memory below 128G for 32-bit paravirtual
domain creation (via dom0_mem=-<value> on the kernel command line).

2) The admin also explicitly states to the tools (i.e. xend) how much
memory is supposed to be left untouched by 64-bit domains.

3) If a 32-bit pv DomU gets created, no ballooning ought to be
necessary (since if it is, no guarantee can be made about the address
range of the memory ballooned out), and memory gets allocated from the
reserved range.

4) Upon 64-bit (or HVM) DomU creation, the tools
determine the amount of memory to be ballooned out of Dom0 by adding
the amount needed for the new guest and the amount still in the
reserved pool (and then of course subtracting the total amount of
memory the hypervisor has available for guest use).

Signed-off-by: james song (wei) <jsong@novell.com>
Keir Fraser [Wed, 2 Dec 2009 08:48:36 +0000 (08:48 +0000)]
VT-d: get rid of hardcode in iommu_flush_cache_entry

Currently iommu_flush_cache_entry uses a fixed size of 8 bytes to flush
the cache. But it also needs to flush caches with different sizes,
e.g. struct root_entry is 16 bytes. This patch removes the hardcoding by
using a parameter "size" to flush caches with different sizes.

Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Wed, 2 Dec 2009 08:47:49 +0000 (08:47 +0000)]
xm: fix message in OptionError deprecated since Python 2.6

BaseException.message has been deprecated since Python 2.6.  To
prevent DeprecationWarning from popping up over this pre-existing
attribute, use a new property that takes lookup precedence.

Signed-off-by: Wei Kong <weikong.cn@gmail.com>
Keir Fraser [Wed, 2 Dec 2009 08:46:47 +0000 (08:46 +0000)]
docs: new tsc_mode VM configuration option

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Wed, 2 Dec 2009 08:46:11 +0000 (08:46 +0000)]
remus: Skip Linux-specific build components on other OSes

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Wed, 2 Dec 2009 08:45:16 +0000 (08:45 +0000)]
libxenlight: write stubdoms logs to file

It turns out that there is a better way to write stubdom logs to a file
than using libxl_console_attach: qemu is the one that provides the
console backend for stubdoms, and qemu is able to redirect a serial port
to a file, so we can use this feature to make sure the first stubdom
console is always redirected to a logfile.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 2 Dec 2009 08:44:40 +0000 (08:44 +0000)]
libxenlight: two small fixes

- set the domid of the guest and not the one of the stubdom in the
libxl_device_model_starting returned to the user;

- check that the length of the two strings matches in
libxl_name_to_domid, otherwise we can get a match for two different
domains that have the same initial part of the name.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 2 Dec 2009 08:44:10 +0000 (08:44 +0000)]
libxl: include signal.h, required for SIGKILL definition

...makes libxl build on NetBSD.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Tue, 1 Dec 2009 14:19:28 +0000 (14:19 +0000)]
x86: Correctly allocate module-relocation area and bzimage headroom.

Without this patch, loading a bzimage dom0 kernel while also
requesting a dynamically-allocated crashkernel area is broken.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 1 Dec 2009 14:08:27 +0000 (14:08 +0000)]
hvmloader: Fix bug in 20510:749b5d46e7a9 (GPE notifications)

The GPE notification decision tree was inverted.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 1 Dec 2009 14:03:42 +0000 (14:03 +0000)]
libxenlight: wait for pv qemu initialization

This patch makes libxl_create_stubdom wait for pv qemu to be properly
initialized before unpausing the stubdom.
A new libxl_device_model_starting pointer is used to wait for pv qemu
initialization, while the libxl_device_model_starting pointer given by
the user is initialized to a new structure with an empty for_spawn
member, because nothing that was spawned has to be waited for anymore.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 14:02:00 +0000 (14:02 +0000)]
x86: fix MCE/NMI injection

This attempts to address all the concerns raised in
http://lists.xensource.com/archives/html/xen-devel/2009-11/msg01195.html,
but I'm nevertheless still not convinced that all aspects of the
injection handling really work reliably. In particular, while the
patch here, on top of the fixes for the problems mentioned in the
referenced mail, also adds code to keep send_guest_trap() from
injecting multiple events at a time, I don't think this is the right
mechanism - it should be possible to handle NMI/MCE nested within
each other.

Another fix on top of the ones for the earlier described problems is
that the vCPU affinity restore logic didn't account for software
injected NMIs - these never set cpu_affinity_tmp, but due to it most
likely being different from cpu_affinity it would have got restored
(to a potentially random value) nevertheless.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 1 Dec 2009 13:59:47 +0000 (13:59 +0000)]
xen: turn numa=on by default

I did some benchmark runs (lmbench & kernel compile) with a number of
guests running in parallel to compare the performance of numa=on vs.
numa=off.  As soon as one starts to load the machine, the performance
goes down in the numa=off case.  The tests were done on an 8-node
machine (4 cores each).  lmbench (actually copying large amounts of
memory) shows a dramatic drop, but I also noticed a significant
performance decrease for a tmpfs-based Linux kernel compile. Here is a
summary of the data:

lmbench's rd benchmark (normalized to native Linux (=100)):
guests       numa=off             numa=on      avg increase
           min   avg   max      min   avg   max
     1          78.0                102.3
     7    37.4  45.6  62.0     90.6 102.3 110.9     124.4%
    15    21.0  25.8  31.7     41.7  48.7  54.1      88.2%
    23    13.4  17.5  23.2     25.0  28.0  30.1      60.2%

kernel compile in tmpfs, 1 VCPU, 2GB RAM, average of elapsed time:
guests    numa=off   numa=on   increase
      1    480.610    464.320    3.4%
      7    482.109    461.721    4.2%
     15    515.297    477.669    7.3%
     23    548.427    495.180    9.7%
again with 2 VCPUs and make -j2:
      1    264.580    261.690    1.1%
      7    279.763    258.907    7.7%
     15    330.385    272.762   17.4%
     23    463.510    390.547   15.7% (46 VCPUs on 32 pCPUs)

Selected tests on a 4-node machine showed similar behavior (7.9 %
increase with 6 parallel guests on the 2 VCPU kernel compile
benchmark).

Note that this does not affect non-NUMA machines at all, since NUMA
will be turned off again by the code if no NUMA topology is detected.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Keir Fraser [Tue, 1 Dec 2009 13:57:02 +0000 (13:57 +0000)]
libxc: pass the restore_context through function and allocate the context on the restore function stack.

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:56:26 +0000 (13:56 +0000)]
libxc: pass the suspend_context through function and allocate the context on the save function stack.

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:55:50 +0000 (13:55 +0000)]
libxc: move the domain_info_context into the restore_context

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:55:15 +0000 (13:55 +0000)]
libxc: move domain_info_context into the save_context

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:54:36 +0000 (13:54 +0000)]
libxc: move restore global variable to a global static context

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:54:01 +0000 (13:54 +0000)]
libxc: create a global context structure to record global variables in save

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:53:14 +0000 (13:53 +0000)]
libxc: create a domain_info_context structure to store guest_width and p2m_size for macros.

Macros now refer to guest_width and p2m_size through a dinfo pointer.

Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:49:33 +0000 (13:49 +0000)]
libxenlight: enables less than maximum vcpus

Allow a number of vcpus other than the maximum to be
turned on during domain creation/restore.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:48:48 +0000 (13:48 +0000)]
libxenlight: allow domain to publish its suspend evtchn

Allow domain to publish its suspend event channel.
Otherwise, the fast event-channel-based suspend
path is disabled.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:48:03 +0000 (13:48 +0000)]
libxenlight: write vcpu availability paths in xenstore

Write cpu availability paths to xenstore. Otherwise,
no vcpus other than the first are enabled.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:47:18 +0000 (13:47 +0000)]
libxenlight: remove vss and xapi patch on domain destroy

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:46:31 +0000 (13:46 +0000)]
libxenlight: set domain handle

Set domain handle much like xend does, identical to
the uuid. This allows obtaining the uuid of a domain
from the handle in the dominfo struct.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:45:45 +0000 (13:45 +0000)]
libxenlight: fix uuid code

- Use proper constants
- Use functions from the uuid library
- Fix broken pointer handling in libxl_dominfo

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:44:13 +0000 (13:44 +0000)]
libxenlight: avoid writing empty values to xenstore

Prevent segmentation fault caused by empty values
in key-value pairs for the /vm/ subdirectory
when creating a pv domain.

Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:41:38 +0000 (13:41 +0000)]
sysctl: Fix mis-allocation of number for XEN_SYSCTL_lockprof_op

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:39:51 +0000 (13:39 +0000)]
Revert 20523:bd52fff29e6e "Remove redundant tests in __start_xen()"

Consensus is that code is clearer with the tests, even though they are
redundant.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:38:18 +0000 (13:38 +0000)]
xentop: Add tmem-freeable info when tmem is active

(No change to xentop output when tmem is inactive.)

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Tue, 1 Dec 2009 13:37:20 +0000 (13:37 +0000)]
xenstat: Linux dom0 statistics for case we use network bonding

I've created a patch that alters dom0 statistics (if they are empty, as
in the case of network bonding) and puts network bridge statistics
instead. It's been tested with network bonding both enabled and
disabled and also by creating a standalone network bridge without
bonding... It was working fine in all my tests...

Signed-off-by: Michal Novotny <minovotn@redhat.com>
Keir Fraser [Tue, 1 Dec 2009 13:36:22 +0000 (13:36 +0000)]
Report hardware tsc frequency even for emulated tsc

I was starting some documentation for tsc_mode and
realized this discussion was never resolved.  Currently
when TSC is emulated the pvclock algorithm reports
to a PV OS Xen's system clock hz rate (1GHz).  Linux
at boot time samples the TSC rate and shows it in
dmesg and the rate is also shown in the "cpu MHz"
field in /proc/cpuinfo.  So when TSC is emulated,
it appears that the processor MHz is 1000.0, which
is likely to be confusing to many Xen users.

This patch changes the reported hz rate to the
hz rate of the initial machine on which the guest
is booted and retains that reported hz rate across
save/restore/migration.

Jeremy has pointed out that reporting 1000.0 MHz is
useful because it shows that TSC is being emulated.
However, with the new tsc_mode default where
a guest may start with native TSC and switch to
emulated TSC after migration, users are likely to
get even more confused.  And "xm debug-key s"
reveals not only whether TSC is being emulated but
also the frequency, so it is more descriptive anyway.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Tue, 1 Dec 2009 13:35:28 +0000 (13:35 +0000)]
tools: avoid cpu over-commitment if numa=on

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Keir Fraser [Tue, 1 Dec 2009 13:34:38 +0000 (13:34 +0000)]
libxenlight: fix segfault when reading blktap2 devs

This patch fixes a possible segfault when reading from
/sys/class/blktap2/devices, if the line read is empty.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:34:10 +0000 (13:34 +0000)]
libxenlight: fix multiple console with stubdoms

libxenlight doesn't properly handle the multiple pv console case,
needed to support an emulated serial in hvm guests with stubdoms.
This patch fixes it.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 30 Nov 2009 11:48:36 +0000 (11:48 +0000)]
x86: Remove redundant tests in __start_xen()

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Keir Fraser [Mon, 30 Nov 2009 10:58:23 +0000 (10:58 +0000)]
ia64: eliminate build warnings

Various warnings appeared since 3.4 - eliminate at least some of them.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Mon, 30 Nov 2009 10:57:42 +0000 (10:57 +0000)]
xend: fix bugs in c/s 20321:7a69f773548e "add a config description item for each guest"

Signed-off-by: james song (wei)<jsong@novell.com>
Keir Fraser [Mon, 30 Nov 2009 10:54:20 +0000 (10:54 +0000)]
libxenlight: implement blktap2 support

This patch implements blktap2 support in libxenlight; blktap2 is only
enabled if it is actually supported by the host, otherwise we fall
back to the previous code. Also, for the moment we pretend that disk
type "file" is actually tap:aio.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 30 Nov 2009 10:53:39 +0000 (10:53 +0000)]
libxenlight: fix suspend/resume

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 30 Nov 2009 10:47:36 +0000 (10:47 +0000)]
libxenlight: add console command

This patch adds "xl console" command similar to "xm console".

Signed-off-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 30 Nov 2009 10:41:28 +0000 (10:41 +0000)]
libxenlight: fix hvm flag when no hvmloader

Signed-off-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 30 Nov 2009 10:38:58 +0000 (10:38 +0000)]
x86/mm: set_p2m_entry() should return 0 on error

set_p2m_entry() ignores halfway errors.
It should return 0 on error.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Fri, 27 Nov 2009 08:09:26 +0000 (08:09 +0000)]
xm: Allow detaching vif by MAC address

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Fri, 27 Nov 2009 08:05:18 +0000 (08:05 +0000)]
VT-d: Free unused interrupt remapping table entry

This patch changes the IRTE allocation method, and frees unused
IRTEs when a device is de-assigned.

Signed-Off-By: Zhai Edwin <edwin.zhai@intel.com>
Keir Fraser [Fri, 27 Nov 2009 07:56:38 +0000 (07:56 +0000)]
build: Execute mk_dsdt with path

Signed-off-by: Simon Horman <horms@verge.net.au>
Keir Fraser [Thu, 26 Nov 2009 15:27:00 +0000 (15:27 +0000)]
hvmloader: Auto-generate IRQ routing tables in ACPI DSDT.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 26 Nov 2009 14:49:40 +0000 (14:49 +0000)]
libxenlight: implement pause and unpause

This patch adds domain pause and unpause commands to xl, implementing
them using the already existing functions libxl_domain_pause and
libxl_domain_unpause.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Thu, 26 Nov 2009 13:51:16 +0000 (13:51 +0000)]
hvmloader: Auto-generate the lengthy pattern-based sections of ACPI DSDT.

At the same time, replace a lengthy linear GPE notification method
with a logarithmic binary chop. Based on a patch by Simon Horman.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 26 Nov 2009 11:35:27 +0000 (11:35 +0000)]
x86: Remove redundant logic for mp_register gsi.

Xen's irqs and gsis are identity mapped, so there is no need to
record the irq and gsi mapping in this array. In addition, the
mapping may not be correct, since dom0 may not figure out the GSIs
from 16 on.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Thu, 26 Nov 2009 11:31:16 +0000 (11:31 +0000)]
x86 shadow: don't try to unshadow for p2m changes after the shadows
have been torn down.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Thu, 26 Nov 2009 11:30:42 +0000 (11:30 +0000)]
Revert 20505:44ea369eefc1

Keir Fraser [Thu, 26 Nov 2009 11:24:50 +0000 (11:24 +0000)]
x86: Always respect guest setting CR4.TSD

Also fix guest reads of CR4.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 26 Nov 2009 11:02:30 +0000 (11:02 +0000)]
x86 shadow: fix race when domain is dying

There are some cases in which shadow_write_p2m_entry() is called after
the domain has been killed, causing Xen to crash:

- Race between xc_map_foreign_batch from qemu-dm and "xm destroy"
  command.
- The hypervisor calls domain_crash when PoD fails.
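A toy model of the failure mode, assuming a simplified domain whose shadow pool is set to None once the shadows are torn down (Domain, shadow_pool, teardown_shadows and write_p2m_entry are hypothetical names, not Xen's). The guard makes a late p2m write skip the unshadow step instead of touching freed state:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Domain:
    # Hypothetical, heavily simplified stand-in for a guest's paging state.
    p2m: Dict[int, int] = field(default_factory=dict)  # gfn -> mfn
    shadow_pool: Optional[Set[int]] = field(default_factory=set)  # shadowed mfns

    def teardown_shadows(self) -> None:
        # Once the domain is dying, its shadow structures are freed.
        self.shadow_pool = None


def write_p2m_entry(d: Domain, gfn: int, mfn: int) -> None:
    old = d.p2m.get(gfn)
    d.p2m[gfn] = mfn
    if d.shadow_pool is None:
        # Late caller (e.g. a qemu-dm mapping racing with destroy, or a PoD
        # failure): the shadows are gone, so there is nothing to unshadow.
        return
    if old is not None:
        d.shadow_pool.discard(old)  # drop shadows of the old mapping
```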

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Keir Fraser [Thu, 26 Nov 2009 11:00:49 +0000 (11:00 +0000)]
Implement rdtscp emulation and rdtscp_aux "support"

The rdtscp instruction (and the associated TSC_AUX
MSR) is present on most recent AMD processors,
and on Nehalem and later Intel processors.
CPUID has a bit to detect the presence of this feature.

Xen intentionally does not expose the CPUID rdtscp bit
to PV OSes or to HVM guests, but PV apps can see this
bit and, as a result, may choose to use the rdtscp
instruction.  When a PV guest with such an app is migrated
to a machine that does not have rdtscp support, the
app is killed by an invalid-opcode fault.  Fix this
by emulating the rdtscp instruction.  We also need
to emulate rdtscp when the machine has rdtscp
support but rdtsc emulation is enabled (which is,
unfortunately, a different path: a privileged op).

The rdtscp instruction reads the TSC_AUX register, which
presumably is set by the OS (and, in the case of
tsc_mode==pvrdtscp, will be set by Xen).  HVM Linux
and PV Linux will not set TSC_AUX because the
CPUID rdtscp bit is not propagated by Xen; I'm told that
Windows always sets TSC_AUX to zero.  So for PV guests
running on rdtscp-capable hardware (that don't use
tsc_mode==pvrdtscp), always set TSC_AUX to zero.
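A minimal sketch of what rdtscp emulation has to hand back, assuming a hypothetical Vcpu record (not Xen's struct vcpu): the instruction loads the TSC into EDX:EAX and TSC_AUX into ECX, and, per the policy above, PV guests not using tsc_mode==pvrdtscp always see TSC_AUX as zero.

```python
from dataclasses import dataclass
from typing import Tuple

MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


@dataclass
class Vcpu:
    # Hypothetical per-vCPU state used only for this sketch.
    tsc: int                # guest-visible TSC value at the time of the trap
    tsc_aux: int = 0        # TSC_AUX value, only honoured in pvrdtscp mode
    pvrdtscp: bool = False  # True if the domain uses tsc_mode==pvrdtscp


def emulate_rdtscp(v: Vcpu) -> Tuple[int, int, int]:
    """Return (EDX, EAX, ECX) as the rdtscp instruction would set them."""
    tsc = v.tsc & MASK64
    # Guests outside pvrdtscp mode never program TSC_AUX: report zero.
    aux = v.tsc_aux & MASK32 if v.pvrdtscp else 0
    return tsc >> 32, tsc & MASK32, aux
```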

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Thu, 26 Nov 2009 11:00:15 +0000 (11:00 +0000)]
libxc: Fix 32-vs-64 bitness issue in saving vcpu contexts in core dump

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 26 Nov 2009 10:57:26 +0000 (10:57 +0000)]
xm: Fix maxvcpus support

Signed-off-by: Michal Novotny <minovotn@redhat.com>
Keir Fraser [Thu, 26 Nov 2009 10:56:49 +0000 (10:56 +0000)]
xend: little fix for tap

Get the device type after creating the tap device, as device_create does.

Signed-off-by: Wei Kong <weikong.cn@gmail.com>
Keir Fraser [Wed, 25 Nov 2009 14:19:50 +0000 (14:19 +0000)]
libxenlight: move logging macros to the public header

This patch moves the logging macros to the public header so that they
can be reused by clients of the library.  It also refactors the code
that creates the qemu logfile into a generic function that can be
reused to create generic xen logfiles under /var/log/xen.  Finally, xl
is changed to log to a file when running in the background.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 25 Nov 2009 14:19:20 +0000 (14:19 +0000)]
libxenlight: clean up the domain when it dies

This patch adds two functions to libxenlight that recognize when a
particular domain dies. After creating a domain, xl uses these
functions to wait for its death and clean up its resources.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 25 Nov 2009 14:15:57 +0000 (14:15 +0000)]
x86 time: Fix build and clean up.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 25 Nov 2009 14:12:58 +0000 (14:12 +0000)]
x86 hpet: Do nothing in hpet_broadcast_exit() if no timer deadline.

From: "Jiang, Yunhong" <yunhong.jiang@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 25 Nov 2009 14:11:37 +0000 (14:11 +0000)]
libxenlight: implement stubdom support

This patch implements stubdom support for libxenlight:

- it adds two functions: one to find the stubdom domid of a domain and
one to determine whether a given domain is actually a stubdom;

- it moves all the device init functions from xl.c to libxl.c because
they are needed to set up the devices of stubdoms;

- it fixes some bugs in the pci setup that prevented pci passthrough
from working correctly with stubdoms.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 25 Nov 2009 14:11:02 +0000 (14:11 +0000)]
xm: Add maxvcpus support

This patch adds maxvcpus support to the xen xm command. It uses the
vcpu_avail bitmask and sets the number of vcpus to maxvcpus if
present.  If maxvcpus is not present, the old behavior is preserved.

In the domain config file, you can define it as follows:

maxvcpus = 4
vcpus = 2

This automatically sets vcpus to 4 and sets the corresponding bitmask
so that 2 vcpus are present in the guest, with the option to increase
this up to 4 vcpus. If maxvcpus is not present, the old behavior for
vcpus is preserved, i.e. you set vcpus to the number of vcpus to be
used and vcpu_avail is set so that all of them are used. Only when
both maxvcpus and vcpus are given is a new vcpu_avail value
calculated, so that the PV guest sees only the desired number of vcpus.
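The vcpu_avail arithmetic described above can be sketched as follows. vcpu_config is a hypothetical helper, not xm's actual code, and it assumes bit i of vcpu_avail marks vcpu i as online:

```python
from typing import Optional, Tuple


def vcpu_config(vcpus: int, maxvcpus: Optional[int] = None) -> Tuple[int, int]:
    """Return (vcpus to create, vcpu_avail bitmask) for a PV domain config."""
    if maxvcpus is None:
        # Old behaviour: create `vcpus` vcpus and mark every one available.
        return vcpus, (1 << vcpus) - 1
    if maxvcpus < vcpus:
        raise ValueError("vcpus cannot exceed maxvcpus")
    # Create maxvcpus vcpus, but only the first `vcpus` are online at boot.
    return maxvcpus, (1 << vcpus) - 1


print(vcpu_config(2, 4))  # (4, 3): four vcpus allocated, bitmask 0b11
```

The remaining bits are what 'xm vcpu-set' can later switch on, up to maxvcpus.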

It has been tested using a RHEL-5 32-bit PV guest with maxvcpus = 4
and vcpus = 2, and also with the previous setup of vcpus = 2 only. In
both cases I was able to use 'xm vcpu-set {domainId} {numberOfVCPUs}'
to move the vcpu count anywhere from 0 to maxvcpus/vcpus, so it works
as designed.

Signed-off-by: Michal Novotny <minovotn@redhat.com>
Keir Fraser [Wed, 25 Nov 2009 14:06:17 +0000 (14:06 +0000)]
cpuidle: Add decaying history logic to menu idle predictor

This patch is ported from Linux upstream git commit
816bb611e41be29b476dc16f6297eb551bf4d747

The original description is:
"
Add decaying history of predicted idle time, instead of using the last
early wakeup. This logic helps the menu governor do a better job of
predicting idle time.

With this change, we also measured noticeable (~8%) power savings on a
DP server system with CPUs supporting deep C states, when the system was
lightly loaded. There was no change to power or perf on other load
conditions.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
"

In a Xen environment, we also observe that this patch reduces idle
power fluctuation.  On one DP server, when the system is purely idle,
the ratio of the standard deviation to the average of the power draw
(watts) drops from 6% to 2%, which helps idle power measurement
accuracy.  There is no performance or power change when the system is
loaded.
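The idea of a decaying history can be illustrated with a plain exponentially weighted average. This is a generic sketch, not the formula from the Linux commit, and the decay weight of 4 is an arbitrary assumption. Unlike "use the last early wakeup", a single short idle period only pulls the prediction part of the way down:

```python
def update_prediction(predicted_us: float, measured_us: float, decay: int = 4) -> float:
    """Move the prediction 1/decay of the way toward the latest measurement."""
    return predicted_us + (measured_us - predicted_us) / decay


def predict(samples, initial_us: float = 0.0, decay: int = 4) -> float:
    """Fold a sequence of measured idle times into one decayed prediction."""
    p = initial_us
    for s in samples:
        p = update_prediction(p, s, decay)
    return p


# One early wakeup (20 us) after long idle periods only lowers the prediction
# from 100 us to 80 us, instead of replacing it with 20 us outright.
print(update_prediction(100.0, 20.0))  # 80.0
```

Damping single outliers this way is one plausible reason the idle power readings fluctuate less.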

Signed-off-by: Yu Ke <ke.yu@intel.com>
Keir Fraser [Wed, 25 Nov 2009 14:05:28 +0000 (14:05 +0000)]
Replace tsc_native config option with tsc_mode config option

(NOTE: pvrdtscp mode is not finished yet, but all other
modes have been tested, so it seemed better to submit
this fairly major patch sooner rather than later, to get
more mileage on it before the next release.)

The new tsc_mode config option supersedes tsc_native and
offers a more intelligent default and an additional
option for intelligent apps running on PV domains
("pvrdtscp").

For PV domains, the default mode will determine whether the initial
host has a "safe"** TSC (meaning it is always synchronized
across all physical CPUs).  If so, all domains will
execute all rdtsc instructions natively; if not,
all domains will emulate all rdtsc instructions while
providing the TSC hertz rate of the initial machine.
After being restored or live-migrated, all PV domains will
emulate all rdtsc instructions.  Hence, this default mode
guarantees correctness while providing native performance
in most conditions.

For PV domains, tsc_mode==1 will always emulate rdtsc
and tsc_mode==2 will never emulate rdtsc.  For tsc_mode==3,
rdtsc will never be emulated, but information is provided
through pvcpuid instructions and rdtscp instructions
so that an app can obtain "safe" pvclock-like TSC information
across save/restore and live migration. (Will be completed in
a follow-on patch.)
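The PV rules above can be summarized as a small decision function. This is a sketch assuming the default mode is numbered 0 (the text only numbers modes 1 to 3) and treating pvrdtscp as "never emulate rdtsc"; the constant and function names are hypothetical:

```python
TSC_MODE_DEFAULT = 0  # assumed numbering for the default mode
TSC_MODE_ALWAYS_EMULATE = 1
TSC_MODE_NEVER_EMULATE = 2
TSC_MODE_PVRDTSCP = 3


def pv_emulates_rdtsc(mode: int, host_tsc_safe: bool, migrated: bool) -> bool:
    """Return True if a PV domain should trap and emulate rdtsc."""
    if mode == TSC_MODE_ALWAYS_EMULATE:
        return True
    if mode in (TSC_MODE_NEVER_EMULATE, TSC_MODE_PVRDTSCP):
        return False
    if mode == TSC_MODE_DEFAULT:
        # Native only while still on the initial host, and only if that
        # host's TSC is "safe"; after save/restore or migration, emulate.
        return migrated or not host_tsc_safe
    raise ValueError(f"unknown tsc_mode {mode}")
```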

For HVM domains, the default mode and "always emulate"
mode do the same as tsc_native==0; the other two modes
do the same as tsc_native==1.  (HVM domains since 3.4
have implemented a tsc_mode=default-like functionality,
but also can preserve native TSC across save/restore
and live-migration IFF the initial and target machines
have a common TSC cycle rate.)
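The HVM equivalence in the previous paragraph amounts to a simple mapping; hvm_tsc_native is a hypothetical helper, again assuming the default mode is numbered 0:

```python
def hvm_tsc_native(tsc_mode: int) -> int:
    """Legacy tsc_native value that an HVM domain's tsc_mode behaves like."""
    if tsc_mode in (0, 1):  # default, always emulate
        return 0
    if tsc_mode in (2, 3):  # never emulate, pvrdtscp
        return 1
    raise ValueError(f"unknown tsc_mode {tsc_mode}")
```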

** All newer AMD machines, and Nehalem and future Intel
machines have "Invariant TSC"; many newer Intel machines
have "Constant TSC" and do not support deep-C sleep states;
these and all single-processor machines are "safe".
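One reading of the "safe" footnote, as a sketch: the feature flags are assumed inputs, except that Invariant TSC is genuinely reported by CPUID leaf 0x80000007, EDX bit 8. The combination rule below is an interpretation of the footnote, not Xen's detection code, and both function names are hypothetical:

```python
def invariant_tsc(cpuid_80000007_edx: int) -> bool:
    """CPUID leaf 0x80000007, EDX bit 8 reports Invariant TSC."""
    return bool(cpuid_80000007_edx & (1 << 8))


def tsc_is_safe(num_cpus: int, invariant: bool, constant: bool,
                deep_c_states: bool) -> bool:
    """Is the TSC expected to stay synchronized across all physical CPUs?"""
    if num_cpus == 1 or invariant:
        return True
    # A constant-rate TSC may still stop in deep C-states, so only trust it
    # when deep C-states are unavailable.
    return constant and not deep_c_states
```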

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Wed, 25 Nov 2009 14:04:46 +0000 (14:04 +0000)]
hvmloader: Advertise ECC memory in SMBIOS tables.

Microsoft's Windows logo certified hardware requires ECC; since the
SVVP certification runs the same test on the guest, Xen domains will
currently fail it.
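For reference, one place SMBIOS reports ECC is the "Memory Error Correction" field of a Type 16 (Physical Memory Array) structure. The patch does not say which fields it changes, so the sketch below only shows that structure packed with multi-bit ECC (value 06h); smbios_type16 is a hypothetical helper and the other field values are illustrative:

```python
import struct

ECC_NONE = 0x03       # SMBIOS "Memory Error Correction": None
ECC_MULTI_BIT = 0x06  # SMBIOS "Memory Error Correction": Multi-bit ECC


def smbios_type16(handle: int, max_capacity_kb: int, num_devices: int,
                  ecc: int = ECC_MULTI_BIT) -> bytes:
    """Pack an SMBIOS Type 16 (Physical Memory Array) structure."""
    formatted = struct.pack(
        "<BBHBBBIHH",
        16,               # type: Physical Memory Array
        0x0F,             # length of the formatted area (15 bytes)
        handle,
        0x03,             # location: system board or motherboard
        0x03,             # use: system memory
        ecc,              # memory error correction
        max_capacity_kb,  # maximum capacity in KB
        0xFFFE,           # memory error information handle: not provided
        num_devices,
    )
    return formatted + b"\x00\x00"  # no strings: double-NUL terminator
```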

From: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 24 Nov 2009 14:43:07 +0000 (14:43 +0000)]
x86: Add a new physdev_op PHYSDEVOP_setup_gsi for GSI setup.

GSIs 0-15 are set up by the hypervisor, and GSIs >= 16 are set up by
dom0 through this physdev_op, PHYSDEVOP_setup_gsi. This patch helps
dom0 avoid intrusive changes to the ioapic code.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Tue, 24 Nov 2009 14:38:37 +0000 (14:38 +0000)]
tmem: fix freeable memory accounting error

Fix tmem accounting error that causes an "apparent"
memory leak, creating false negatives when testing
memory availability for launching a new domain.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Tue, 24 Nov 2009 14:37:59 +0000 (14:37 +0000)]
tmem: Fix another race in tmem on domain destroy.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Mon, 23 Nov 2009 15:19:38 +0000 (15:19 +0000)]
Revert 20457:1bbc132675a2

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 23 Nov 2009 08:06:54 +0000 (08:06 +0000)]
pygrub: add basic support for parsing grub2 style grub.cfg file

This represents a very simplistic approach to parsing these files.  It
is basically sufficient to parse the files produced by Debian
Squeeze's version of update-grub. The actual grub.cfg syntax is much
more expressive but apparently not documented apart from a few
examples...
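A minimal sketch of this kind of simplistic parsing (not pygrub's actual code): it collects the title of each menuentry plus its linux and initrd lines, ignores everything else, and does not handle submenus or scripting. parse_grub2 is a hypothetical name:

```python
import shlex
from typing import Dict, List, Optional

Entry = Dict[str, Optional[str]]


def parse_grub2(text: str) -> List[Entry]:
    """Collect title, kernel, args and initrd for each menuentry block."""
    entries: List[Entry] = []
    cur: Optional[Entry] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 2)  # update-grub separates words with tabs
        if fields[0] == "menuentry" and line.endswith("{"):
            # shlex copes with single- or double-quoted titles; options such
            # as --class are ignored.
            words = shlex.split(line[len("menuentry"):-1])
            cur = {"title": words[0] if words else "", "kernel": None,
                   "args": "", "initrd": None}
        elif line == "}" and cur is not None:
            entries.append(cur)
            cur = None
        elif cur is not None and len(fields) > 1:
            if fields[0] == "linux":
                cur["kernel"] = fields[1]
                cur["args"] = fields[2] if len(fields) > 2 else ""
            elif fields[0] == "initrd":
                cur["initrd"] = fields[1]
    return entries
```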

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>